perm filename VIS[00,BGB]1 blob
sn#069841 filedate 1973-11-09 generic text, type C, neo UTF8
2.0 Computer Vision Theory.
2.1 Introduction to Computer Vision Theory.
2.2 Related Work - State of the Art.
2.3 Computer Vision Tasks.
2.4 The Vision Cycle.
2.5 The Nature of Images.
2.6 The Nature of Worlds.
2.7 Locus Solving.
2.8 Grand Vision Theory.
2.9 Summary of Arguments.
2.1 Introduction to Computer Vision Theory.
Vision is the act or power of seeing. Computer vision
concerns programming a computer to do a task that demands the use of
an image forming light sensor, such as a television camera. Stated
in one sentence, my theory is that normal vision is a continuous
process of keeping an internal visual simulator in sync with
perceived images of the external reality, for the sake of some goal.
But to start at a logical beginning, I wish to postulate
the existence of the external physical universe, the existence of
images and image processing, the existence of internal mental
states, and the existence of visual tasks and goals, as they
commonly relate to computer technology. Also, for the sake of
starting the discussion, a vision system can be described as
mediating between perceived images and a world model. The two poles
(or operands) of the system are called the "bottom" for images and
the "top" for models. The "world model" operand can be
identified even in vision systems that do not advertise it. Work
that truly lacks a world model is not computer vision; usually it
is image processing. Given the two classes of operands, images and
worlds, there are three operations that a general vision system may
perform: recognition, verification and description.
Verification vision is also called top-down or model-driven
vision. The verification approach involves predicting an image,
followed by comparing the predicted image and a perceived image for
slight differences which are expected but not yet measured.
Recognition vision and descriptive vision are also called bottom-up
or data-driven vision. Recognition vision is qualitative: what is in
the picture is determined by extracting a set of features
(qualities) and classifying them according to an essentially
statistical world model. Description vision is quantitative. Many
theories differ only superficially, in that they compound
the three basic modes of vision or use different
forms of the two basic elements: image and model.
In this chapter, several kinds of theory are presented.
There is general theory, which is my interpretation of the state of
the art of computer vision. There is the special theory, which
inspired this work and led to the particular design choices I wish
to elaborate and defend. There are alternate theories and designs,
which are mentioned for the sake of contrast. Finally, I will
conclude by giving my world view of the ultimate nature of visual
perception, consciousness and intelligence. The word "theory", as
used here, means simply a set of statements presenting a systematic
view of a subject. Specifically, I wish to exclude the connotations
that the theory is a mathematical theory or a natural theory.
Perhaps there can be such a thing as an "artificial theory" which
extends from the philosophy thru the design of an artifact. An
artificial theory is validated by the successful design and
production of the intended artifact.
In early 1942, there were five ideas on how to manufacture
fissionable material for a bomb; three uranium isotope separation
techniques: electromagnetic, centrifuge and gaseous-diffusion; and
two plutonium reactor techniques: graphite and heavy water. In spite
of the considerable power of theory in nuclear physics, there was
no a priori way to select the best method; so all the theories were
tried, and three of the methods were made to work by 1945. Although
several different theories of design may lead to the same ultimate
product, one theory is going to be the first to work, perhaps another
will work best, and perhaps yet a third will be the cheapest.
2.2 Related Work - State of the Art.
In many papers, Larry Roberts is justly credited with doing
the seminal work in Computer Vision; and although his thesis
appeared over ten years ago, the subject has languished, dependent on
and subsumed by the four areas called Image Processing, Pattern
Recognition, Computer Graphics, and Artificial Intelligence. Thus I
will point out the relevant threads of computer vision in each of
these four subject areas.
(Computer vision and A.I.):
At one extreme, computer vision may be described as merely
the problem of getting visual input hardware properly connected to a
computer; once the computer can "see" a raster of intensities in its
memory, the rest of the problem is artificial intelligence. The
other extreme is harder to depict because it requires figuring where
to draw the line between vision software and intelligence software;
one goal I wish to pursue in this chapter is to demarcate such a line.
Normal vision, as opposed to visual puzzles, is not an Artificial
Intelligence problem in the sense that it does not involve conscious
cognition, verbal abstraction, symbolism or self programming.
"The history of progress in the development of systems for automatic
symbolic integration poses an interesting question about the
definition of artificial intelligence. Few would argue that Slagle's
SAINT program was a product of artificial intelligence research.
Moses' SIN program for symbolic integration seldom needed to resort
to search, and for this reason some people consider it much more
powerful (intelligent ?) than SAINT. Now, Risch (1969) has developed
an algorithm for integrating many types of expressions. Risch
considers himself a mathematician, not an artificial intelligence
researcher. In your opinion should Risch's algorithm be considered
part of the subject matter of artificial intelligence ? If you would
exclude Risch from artificial intelligence, how would you respond to
the statement that every artificial intelligence program might
eventually be dominated by a (more intelligent?) non artificial
intelligence algorithm? If you would include Risch, would you also
include the long-division algorithm?"
- Nils J. Nilsson, problem 4-5;
Problem-Solving Methods in Artificial Intelligence.
(Feigenbaum Quote).
[the relation between Artificial Intelligence, experiment,
environmental simulation].
"The design, implementation, and use of the robot hardware
presents some difficult, and often expensive, engineering and
maintenance problems. If one is to work in this area solving such
problems is a necessary prelude but, more often than not,
unrewarding because the activity does not address the questions of
A.I. research that motivate the project. Why, then, build devices?
Why not simulate them and their environment? In fact, the SRI group
has done good work in simulating a version of their robot in a
simplified environment. The answer given is as follows. It is felt
by the SRI group that the most unsatisfactory part of their
simulation effort was the simulation of the environment. Yet, they
say that 90% of the effort of the simulation team went into this
part of the simulation. It turned out to be very difficult to
reproduce in an internal representation for a computer the necessary
richness of environment that would give rise to interesting behavior
by the highly adaptive robot. It is easier and cheaper to build a
hardware robot to extract what information it needs from the real
world than to organize and store a useful model. Crudely put, the
SRI group's argument is that the most economic and efficient store
of information about the real world is the real world itself."
- E. A. Feigenbaum [ref. X].
---------------------------------------------------------------------
The traditional subject of image processing involves the study and
development of programs that enhance, transform and compare 2D
images. Nearly all such image processing work can be subsumed into
computer vision.
2.3 Computer Vision Tasks.
2.3.1 Continuity Tasks: the Cart Task & the Turn Table Task.
2.3.2 Analysis of a Blocks picture.
2.3.3 Recognition Tasks.
Seemingly, the visual tasks are selected by the sponsors
or the administrators of research, rather than the choice of task
being considered a serious research question in its own right.
(Vision tasks).
The computer vision problem I wish to consider is to write a
program that can see and act with respect to the real physical
world. The interest of other researchers in modeling human
perception, in participating in traditional philosophical
arguments, in solving puzzle problems or in developing advanced
automation techniques must constantly be taken into account when
discussing computer vision.
Vision tasks that emphasize the continuity of the
visual process:
(cart task).
Given a computer controlled cart, explore and map the world.
(Cart Hardware Description). The cart at the Stanford
Artificial Intelligence Laboratory is intended for outdoor use and
consists of four bicycle wheels, a piece of plywood, two car
batteries, a television camera, a television transmitter, and a
toy airplane radio receiver. (The vehicle being discussed is not
"Shakey", which belongs to the Stanford Research Institute's
Artificial Intelligence Group. There are two "Stanford-ish" A.I.
Labs and each has a computer controlled vehicle.) Logically the cart
has three motors which can be commanded to run in one or the other
direction under computer control. The six possible cart action
commands are: run forwards, run backwards, steer to the left, steer
to the right, pan camera to the left, pan camera to the right.
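The six commands above can be written down as a small data structure. The following is only a sketch of one possible encoding: the names Motor and CartCommand, the helper commands_for, and the sign convention (+1 and -1 for the two directions) are assumptions made for illustration and do not describe the actual cart software.

```python
from enum import Enum


class Motor(Enum):
    """The three logical motors of the cart (names are hypothetical)."""
    DRIVE = "drive"
    STEER = "steer"
    PAN = "pan"


class CartCommand(Enum):
    """Six action commands: each motor runs in one of two directions.

    The +1 / -1 direction signs are an assumed convention.
    """
    RUN_FORWARDS = (Motor.DRIVE, +1)
    RUN_BACKWARDS = (Motor.DRIVE, -1)
    STEER_LEFT = (Motor.STEER, -1)
    STEER_RIGHT = (Motor.STEER, +1)
    PAN_LEFT = (Motor.PAN, -1)
    PAN_RIGHT = (Motor.PAN, +1)

    @property
    def motor(self):
        return self.value[0]

    @property
    def direction(self):
        return self.value[1]


def commands_for(motor):
    """Return the two commands that drive the given motor."""
    return [c for c in CartCommand if c.motor is motor]
```

For example, commands_for(Motor.PAN) yields the two camera panning commands, and the enumeration as a whole has exactly six members.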
(turn table task). The turn table task is to construct
a 3-D model from a sequence of 2-D television images taken
of an object rotated on a turn table.
(blocks tasks).
The classic block vision task, dating from
Roberts, consists of two parts: first convert a video image
into a line drawing; second, find a selection of
prototype blocks that account for the line drawing.
[single image vs. multiple images].
[perfect line drawing puzzles: Guzman & Waltz].
[imperfect line drawing analysis]
(Recognition tasks).
2.4 The Vision Cycle.
The structure of any computer vision system can be analysed
as a mediator between perceived images and a world model.
The two poles of the vision transducer are called the "bottom" for
images and the "top" for models. Although I do not like the
vision-language analogy, I wish to adopt the top and bottom as
formal vision terminology, because it is concise and widely used.
Having established a top and a bottom, we can now introduce
those two jargon gems: top-down and bottom-up.
A notion characteristic of my approach is the observation that
computer vision is the inverse of computer graphics. The problem of
computer graphics is to synthesize images from three dimensional
models; the problem of computer vision is to analyze images into
three dimensional models.
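The inverse relation can be illustrated with a single point. The sketch below assumes an ideal pinhole camera at the origin looking along the z axis; that camera model and the function names project and back_project are assumptions made for illustration. The graphics direction is a function of the 3-D point alone, whereas the vision direction needs an extra datum (here the depth), since one image point corresponds to a whole ray of world points.

```python
def project(point, focal_length=1.0):
    """Graphics direction: map a 3-D point (x, y, z) to a 2-D image point.

    Assumes an ideal pinhole camera at the origin looking along +z.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def back_project(pixel, depth, focal_length=1.0):
    """Vision direction: recover the 3-D point from an image point.

    The image point alone only fixes a ray; the depth must be supplied
    from elsewhere (a model, a second view, and so on).
    """
    u, v = pixel
    return (u * depth / focal_length, v * depth / focal_length, depth)
```

Projecting a point and then back-projecting the result with the correct depth returns the original point; with any other depth it returns a different point on the same ray.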
The Vision Mandala.
1. PREDICT     2D ← 3D    synthesis    verification
2. PERCEIVE    3D ← 2D    analysis     revelation
3. COMPARE                             recognition
Three modes of operation on the vision cycle.
1. Revelation Vision - Data Driven Vision.
(nearly pure bottom up vision).
2. Verification Vision - Model Driven Vision.
(nearly pure top down vision).
3. Recognition Vision - feature classification.
(bottom up random access into existing top).
Vision.
Heuristic Vision - guess and test.
Accommodating Vision.
(first bottom-up, next top-down, then verify and correct).
---------------------------------------------------------------------
In my special theory, the vision transducer is:
1. Continuous rather than discrete.
2. Exact rather than fuzzy; numeric rather than symbolic.
3. Bidirectional rather than one way.
(Bidirectional).
The vision transducer has three possible modes:
verification, revelation and recognition.
Depending on circumstances, the vision transducer should be able to
run almost entirely top-down (verification vision) or bottom-up
(revelation vision). Verification vision is all that is required in
a well-known and consequently predictable environment; whereas
revelation vision is required in a brand new or rapidly changing
environment.
(recognition)
Recognition involves comparing
perceived data with predicted data; such a comparison can
be done on any of the four types of 2-D images or the 3-D models.
Arcane recognition techniques can be avoided by improving the
prediction and the analysis so that matches are nearly obvious.
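The cycle of predicting, perceiving and comparing can be sketched on a toy problem. In the sketch below the world model is reduced to a single number (the position of one bright spot) and the image to a 1-D raster; both reductions, and all of the function names, are assumptions made for illustration rather than a description of any existing system. The comparison is numeric: the difference between predicted and perceived spot positions is fed back as a correction to the model.

```python
def predict(model_x, width=32):
    """Synthesis: render the model as a 1-D raster with one bright pixel.

    Assumes the rounded model position lies inside the raster.
    """
    image = [0] * width
    image[int(round(model_x))] = 1
    return image


def centroid(image):
    """Analysis: intensity-weighted mean position of a raster."""
    total = sum(image)
    return sum(i * v for i, v in enumerate(image)) / total


def compare(predicted, perceived):
    """Signed difference between perceived and predicted spot positions."""
    return centroid(perceived) - centroid(predicted)


def vision_cycle(model_x, perceived_images):
    """Keep the model in sync with a sequence of perceived images."""
    history = []
    for perceived in perceived_images:
        predicted = predict(model_x)           # top-down step
        error = compare(predicted, perceived)  # comparison step
        model_x += error                       # correct the model
        history.append(model_x)
    return history
```

When the prediction is good the error is small and the correction is trivial, which is the sense in which a better prediction makes the match nearly obvious.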
2.5 The Nature of Images.
There are three basic kinds of information in a 2-D visual
image: photometric, geometric, and topological; also there are
three kinds of 2-D images: raster, contour, and mosaic.
---------------------------------------------------------------------
Assumption: The perceived images are low quality, black and white,
digitized television images.
Alternatives: 1. High quality electronic imaging device.
2. Film scanning system.
3. Active 3-D imaging device.
4. Non-light devices: sound, radar, neutrinos, etc.
Discussion:
The argument in favor of using low quality, black and white,
television images is based on poverty rather than principle. Low
quality television is the cheapest electronic way to perceive an
image in real time.
Although a super intellectual entity would have eyes that
could see the whole electromagnetic spectrum from gamma radiation to
direct current as well as "voices" that could broadcast on any and
all frequencies, the video restriction
---------------------------------------------------------------------
An image contains three basic kinds of data:
topological data, geometric data, and photometric data.
The quality of the particular computer vision system
that one is condemned to use is a great influence on one's
theoretical approach.
size of image
photometric accuracy, bits per pixel
resolution
speed of image taking
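The quality parameters listed above combine into the amount of data a vision program must handle. The sketch below is one way to record them; the class name ImageSpec and its field names are hypothetical, and any figures used with it are illustrative only, not measurements of a particular camera or digitizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageSpec:
    """Quality parameters of a digitized image source (hypothetical names)."""
    width: int                # image size, pixels per row
    height: int               # image size, rows per frame
    bits_per_pixel: int       # photometric accuracy
    frames_per_second: float  # speed of image taking

    def bits_per_frame(self):
        return self.width * self.height * self.bits_per_pixel

    def bytes_per_frame(self):
        # Round up to a whole number of 8-bit bytes.
        return (self.bits_per_frame() + 7) // 8

    def bits_per_second(self):
        return self.bits_per_frame() * self.frames_per_second
```

For instance, an assumed 256 by 256 raster at 4 bits per pixel occupies 32768 bytes per frame, so halving either the raster size or the photometric accuracy changes what a program can afford to store and compare.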
2.6 The Nature of Worlds.
The rules about the world that can be assumed a priori by a
programmer are the laws of physics; programming a Newtonian
simulator of the mundane physical world to a given approximation is
difficult but more fruitful than programming an Aristotelian
simulator.
(Reality Simulation).
---------------------------------------------------------------------
Assumption: The world model should be a 3-D geometric model.
Alternatives: 1. Image memory and 2-D models.
2. Procedural Knowledge, e.g. Hewitt & Winograd.
3. Semantic knowledge, e.g. Wilkes.
4. Formal Logic models, e.g. McCarthy & Hayes.
5. Statistical world model, e.g. Duda & Hart.
Arguments:
---------------------------------------------------------------------
Assumption: Partial knowledge is represented by approximation.
Alternatives: 1. Tree of possibilities.
2. Multi valued logic.
3. Probabilities.
Arguments:
---------------------------------------------------------------------
2.7 Locus Solving.
1. Camera Locus Solving.
2. Body Locus Solving.
Silhouette Cone Intersection.
Envelope bodies.
3. Sun Locus Solving.
(compute it, look at it, shine and shadows).
The crux of computer vision is to deduce information about
the world being viewed from images of that world. The world
information most directly relevant to vision is the physical
location, extent and light scattering properties of solid opaque
objects; the location, orientation and scales of the cameras that
take the pictures; and the location and nature of the lights that
illuminate the world. Accordingly, three important vision problems
are camera solving, body solving, and sun solving.
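Body solving by silhouette intersection, listed in the outline above, can be sketched in a much reduced form. The sketch assumes a single horizontal slice through the object, an orthographic camera (so each silhouette cone degenerates to a strip), and silhouettes given as one interval per viewing angle; all of these simplifications and all of the names are assumptions made for illustration. A point of the slice survives only if its projection falls inside the silhouette in every view, and the surviving set is an envelope of the true body.

```python
import math


def projection(point, angle):
    """Orthographic image coordinate of a 2-D point seen at a given angle."""
    x, y = point
    return x * math.cos(angle) + y * math.sin(angle)


def inside_envelope(point, silhouettes):
    """True if the point projects inside the silhouette in every view.

    silhouettes is a list of (angle_in_radians, low, high) triples.
    """
    return all(lo <= projection(point, a) <= hi for a, lo, hi in silhouettes)


def carve_slice(silhouettes, half_width=2.0, step=0.1):
    """Return the grid points of one slice that survive every silhouette."""
    n = int(round(half_width / step))
    kept = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            p = (i * step, j * step)
            if inside_envelope(p, silhouettes):
                kept.append(p)
    return kept
```

With the silhouettes of a unit disk, two views at right angles leave a square envelope that still contains the point (0.9, 0.9); adding views at intermediate angles carves that corner away, so the envelope tightens toward the body as the turn table rotates.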
The macroscopic world doesn't change very rapidly; between any two
world states there is an intermediate world state. Parallax is the
principal means of depth perception. Parallax is the alchemist that
converts 2-D images into 3-D models. Revelation vision is a process
of comparing perceived images taken in sequence and constructing a
3-D model of the unanticipated objects.
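The way parallax yields depth can be shown with the simplest two-view case. The sketch assumes an ideal pinhole camera that takes two images from positions separated by a known sideways baseline, with the optical axis unchanged between them; the camera model and the function names are assumptions made for illustration. Under those assumptions the depth is Z = f * B / d, where f is the focal length, B the baseline, and d the shift of the image point between the two views.

```python
def project_x(x, z, camera_x=0.0, focal_length=1.0):
    """Horizontal image coordinate of a point seen from a camera at camera_x.

    Assumes a pinhole camera looking along +z.
    """
    return focal_length * (x - camera_x) / z


def depth_from_parallax(u_first, u_second, baseline, focal_length=1.0):
    """Depth of a point from its image shift between two camera positions.

    The second camera is assumed displaced by +baseline along x, so a
    point at finite depth shifts toward smaller u in the second image.
    """
    disparity = u_first - u_second
    if disparity <= 0:
        raise ValueError("no positive parallax: depth cannot be determined")
    return focal_length * baseline / disparity
```

Nearby points shift a great deal between the two images and distant points hardly at all, which is why a sequence of images taken from a moving camera carries the information needed to build a 3-D model.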
2.8 Grand Vision Theory.
"For the purpose of presenting my argument I must first explain the
basic premise of sorcery as don Juan presented it to me. He said
that for a sorcerer, the world of everyday life is not real, or out
there, as we believe it is. For a sorcerer, reality or the world we
all know, is only a description. For the sake of validating this
premise don Juan concentrated the best of his efforts into leading
me to a genuine conviction that what I held in mind as the world at
hand was merely a description of the world; a description that had
been pounded into me from the moment I was born."
- Carlos Castaneda. Journey to Ixtlan.
The larger context of a vision theory depends on one's
opinion about the nature of biological perception. It is my opinion
that mind is to matter as computer software is to computer
hardware. That is, mind is a program that is running in the brain.
Now what software can account for consciousness, the inner private
life of the self that burns in our heads? The so-called stream of
consciousness consists of little voice(s) talking, fragments of
music playing, and most important there is the flow of the here and
now. The here and now is the totality of the particular sights,
sounds, smells, and so on that are being played in your head in
sync with the respective sensory stimuli. So I believe that the
major computation being performed by an intellectual entity in order
to stay conscious of its external world is a reality simulation.
{mimicry arguments: for and against}.
2.9 Summary of Arguments.
1. Preference for continuous rather than discrete vision.
2. Preference for the descriptive approach rather than the
classification model.
3. Preference for working with real images rather than
with puzzle images.